64 research outputs found
SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields
3D reconstruction from 2D images has been extensively studied, typically trained with depth supervision. To relax the dependence on costly-to-acquire datasets, we propose SceneRF, a self-supervised monocular scene reconstruction method using only posed image sequences for training. Fueled by the recent progress in neural radiance fields (NeRF), we optimize a radiance field, though with explicit depth optimization and a novel probabilistic sampling strategy to efficiently handle large scenes. At inference, a single input image suffices to hallucinate novel depth views, which are fused together to obtain the 3D scene reconstruction. Thorough experiments demonstrate that we outperform all recent baselines for novel depth view synthesis and scene reconstruction, on indoor BundleFusion and outdoor SemanticKITTI. Our code is available at https://astra-vision.github.io/SceneRF.
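To make the inference step concrete, here is a minimal sketch of how an expected depth can be rendered from an optimized radiance field using standard volume rendering; the uniform sample handling and the function name are illustrative assumptions, not SceneRF's actual probabilistic sampling strategy:

```python
import numpy as np

def render_expected_depth(sigmas, ts):
    """Expected ray termination depth via standard volume rendering.

    sigmas: densities predicted by the radiance field at the samples, shape (N,)
    ts:     distances of the samples along the ray, shape (N,)
    """
    deltas = np.diff(ts, append=ts[-1] + 1e10)        # spacing between samples
    alphas = 1.0 - np.exp(-sigmas * deltas)           # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))  # transmittance T_i
    weights = trans * alphas                          # ray-termination probabilities
    return float(np.sum(weights * ts))                # expected depth = sum_i w_i * t_i
```

Rendering one such depth per pixel of a virtual view yields a depth map; fusing several of these maps gives a scene reconstruction, as described in the abstract.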
Vision for Scene Understanding
This manuscript covers my recent research on vision algorithms for scene understanding, articulated around three research axes: 3D vision, weakly supervised vision, and vision and physics. At the core of the most recent works are weakly-supervised learning and physics-embodied vision, which address shortcomings of supervised learning, namely its requirement for large amounts of data. The use of more physically grounded algorithms is evidently beneficial, as both robots and humans naturally evolve in a 3D physical world. On the other hand, accounting for physics knowledge reflects important cues about the lighting and weather conditions of the scene, which are central to my work. Physics-informed machine learning is not only beneficial for increased interpretability but also compensates for label and data scarcity.
Detection of Unfocused Raindrops on a Windscreen using Low Level Image Processing
In a scene, rain produces a complex set of visual effects. Such effects may induce failures in outdoor vision-based systems, which could have serious consequences for safety applications. For the sake of these applications, rain detection would be useful to adjust their reliability. In this paper, we introduce the almost unprecedented problem of unfocused raindrops. We then present a first approach to detect these unfocused raindrops on a transparent screen in real time, using spatio-temporal image processing. We successfully tested our algorithm for Intelligent Transport Systems (ITS) using an on-board camera, thereby detecting the raindrops on the windscreen. Our algorithm differs from others in that it does not need the focus to be set on the windscreen; therefore, it may run on the same camera sensor as other vision-based algorithms.
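The abstract does not spell out the low-level cues used; purely as a hypothetical illustration of a spatio-temporal approach, the sketch below combines an inter-frame stability cue with a low-gradient (out-of-focus) cue to flag candidate drop regions. The function name, thresholds, and blob-size filter are all assumptions:

```python
import cv2
import numpy as np

def raindrop_candidates(prev_gray, curr_gray, diff_thresh=10,
                        grad_thresh=15, min_area=50):
    """Flag blurry, temporally stable regions as unfocused-raindrop candidates.

    A drop stuck to the windscreen stays roughly static while the road scene
    behind it moves, and being out of focus it shows weak local gradients.
    """
    # Temporal cue: pixels that barely change between consecutive frames.
    diff = cv2.absdiff(curr_gray, prev_gray)
    static = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY_INV)[1]

    # Spatial cue: unfocused regions have a low Laplacian response.
    lap = cv2.convertScaleAbs(cv2.Laplacian(curr_gray, cv2.CV_16S, ksize=3))
    smooth = cv2.threshold(lap, grad_thresh, 255, cv2.THRESH_BINARY_INV)[1]

    # Combine cues and clean up the mask.
    mask = cv2.bitwise_and(static, smooth)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    # Keep only drop-sized blobs; discard large flat areas such as the sky.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    out = np.zeros_like(mask)
    for i in range(1, n):
        if min_area <= stats[i, cv2.CC_STAT_AREA] <= 100 * min_area:
            out[labels == i] = 255
    return out
```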
Model-based occlusion disentanglement for image-to-image translation
Image-to-image translation is affected by entanglement phenomena, which may occur when the target data encompass occlusions such as raindrops, dirt, etc. Our unsupervised model-based learning disentangles scene and occlusions, while benefiting from an adversarial pipeline to regress physical parameters of the occlusion model. The experiments demonstrate our method is able to handle varying types of occlusions and generate highly realistic translations, qualitatively and quantitatively outperforming the state-of-the-art on multiple datasets.
Comment: ECCV 2020.
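As a loose illustration of the model-based idea (not the paper's actual occlusion model), the sketch below composites a parametric raindrop layer, whose opacity and defocus blur stand in for adversarially regressed physical parameters, over an occlusion-free translation; all names and parameters are hypothetical:

```python
import torch
from torchvision.transforms.functional import gaussian_blur

def compose_occlusion(scene, drop_mask, opacity, blur_sigma):
    """Alpha-composite a parametric raindrop layer over an occlusion-free image.

    scene:      (B, 3, H, W) translated, occlusion-free image in [0, 1]
    drop_mask:  (B, 1, H, W) drop shape/location map in [0, 1]
    opacity, blur_sigma: scalar physical parameters of the occlusion model
    """
    # Unfocused drops roughly refract a blurred version of the scene behind them.
    k = int(2 * round(3 * blur_sigma) + 1)        # odd Gaussian kernel size
    blurred = gaussian_blur(scene, kernel_size=k, sigma=blur_sigma)
    alpha = opacity * drop_mask                   # per-pixel blend weight
    return alpha * blurred + (1 - alpha) * scene
```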
LMSCNet: Lightweight Multiscale 3D Semantic Completion
We introduce a new approach for multiscale 3D semantic scene completion from a sparse 3D occupancy grid, such as voxelized LiDAR scans. As opposed to the literature, we use a 2D UNet backbone with comprehensive multiscale skip connections to enhance feature flow, along with 3D segmentation heads. On the SemanticKITTI benchmark, our method performs on par with all other published methods on semantic completion and better on completion, while being significantly lighter and faster. As such, it provides a great performance/speed trade-off for mobile-robotics applications. The ablation studies demonstrate that our method is robust to lower-density inputs and that it enables very high-speed semantic completion at the coarsest level. Qualitative results of our approach are provided at http://tiny.cc/lmscnet.
Comment: For a demo video, see http://tiny.cc/lmscnet.
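The efficiency trick of processing a 3D volume with a 2D backbone can be sketched by folding the height axis of the voxel grid into the channel dimension; this toy module is an assumption-laden stand-in for the actual LMSCNet architecture (which uses a full UNet with multiscale skips and heads):

```python
import torch
import torch.nn as nn

class Voxel2DBackboneSketch(nn.Module):
    """Treat the height axis of an occupancy grid as 2D channels so a cheap
    2D convolution stack can process the whole volume, then predict per-voxel
    classes with a 3D segmentation head."""

    def __init__(self, height=32, feat=8, n_classes=20):
        super().__init__()
        self.feat = feat
        self.enc = nn.Sequential(                 # stand-in for a 2D UNet
            nn.Conv2d(height, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, height * feat, 3, padding=1),
        )
        self.head = nn.Conv3d(feat, n_classes, 1)  # 3D segmentation head

    def forward(self, vox):                        # vox: (B, X, Y, Z) occupancy
        b, x, y, z = vox.shape
        feat2d = self.enc(vox.permute(0, 3, 1, 2))           # (B, Z*feat, X, Y)
        feat3d = feat2d.view(b, self.feat, z, x, y).permute(0, 1, 3, 4, 2)
        return self.head(feat3d)                   # (B, n_classes, X, Y, Z)

# Example: logits = Voxel2DBackboneSketch()(torch.rand(1, 128, 128, 32))
```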
COARSE3D: Class-Prototypes for Contrastive Learning in Weakly-Supervised 3D Point Cloud Segmentation
Annotation of large-scale 3D data is notoriously cumbersome and costly. As an alternative, weakly-supervised learning alleviates this need by reducing the annotation effort by several orders of magnitude. We propose COARSE3D, a novel architecture-agnostic contrastive learning strategy for 3D segmentation. Since contrastive learning requires rich and diverse examples as keys and anchors, we leverage a prototype memory bank that efficiently captures class-wise, dataset-wide information in a small number of prototypes acting as keys. An entropy-driven sampling technique then allows us to select reliable pixels from predictions as anchors. Experiments on three projection-based backbones show we outperform baselines on three challenging real-world outdoor datasets, working with as little as 0.001% of the annotations.
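As an illustration of entropy-driven anchor selection (the exact weighting used in COARSE3D may differ), the sketch below turns per-point prediction entropy into a sampling distribution that favours confident points; the function name and the exponential weighting are assumptions:

```python
import numpy as np

def entropy_driven_anchors(probs, n_anchors, rng=None):
    """Sample confident predictions as contrastive anchors.

    probs: (N, C) softmax outputs for N points; low entropy = high confidence.
    """
    rng = rng or np.random.default_rng()
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)  # per-point entropy
    weights = np.exp(-entropy)            # favour low-entropy (confident) points
    weights /= weights.sum()
    return rng.choice(len(probs), size=n_anchors, replace=False, p=weights)
```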
ManiFest: Manifold Deformation for Few-shot Image Translation
Most image-to-image translation methods require a large number of training
images, which restricts their applicability. We instead propose ManiFest: a
framework for few-shot image translation that learns a context-aware
representation of a target domain from a few images only. To enforce feature
consistency, our framework learns a style manifold between source and proxy
anchor domains (assumed to be composed of large numbers of images). The learned
manifold is interpolated and deformed towards the few-shot target domain via
patch-based adversarial and feature statistics alignment losses. All of these
components are trained simultaneously during a single end-to-end loop. In
addition to the general few-shot translation task, our approach can
alternatively be conditioned on a single exemplar image to reproduce its
specific style. Extensive experiments demonstrate the efficacy of ManiFest on
multiple tasks, outperforming the state-of-the-art on all metrics and in both
the general- and exemplar-based scenarios. Our code is available at
https://github.com/cv-rits/Manifest.
Comment: ECCV 2022.
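As a rough sketch of the manifold idea (not ManiFest's actual losses or architecture), the snippet below linearly blends per-channel feature statistics between two anchor domains and applies them AdaIN-style; all names and the residual deformation term are assumptions:

```python
import torch

def manifold_style(stats_src, stats_anchor, w, residual=None):
    """Blend per-channel feature statistics between two anchor domains, then
    optionally deform the result towards the few-shot target.

    stats_*:  (mean, std) tuples, each tensor of shape (C,)
    w:        scalar in [0, 1] locating the style along the manifold
    residual: optional (d_mean, d_std) deformation learned from the few shots
    """
    mu = (1 - w) * stats_src[0] + w * stats_anchor[0]
    std = (1 - w) * stats_src[1] + w * stats_anchor[1]
    if residual is not None:
        mu, std = mu + residual[0], std + residual[1]
    return mu, std

def apply_style(feat, mu, std, eps=1e-5):
    """AdaIN-style re-normalisation of a (B, C, H, W) feature map."""
    f_mu = feat.mean(dim=(2, 3), keepdim=True)
    f_std = feat.std(dim=(2, 3), keepdim=True)
    normed = (feat - f_mu) / (f_std + eps)
    return normed * std.view(1, -1, 1, 1) + mu.view(1, -1, 1, 1)
```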
Influence of Fog on Computer Vision Algorithms
This technical report describes a new, preliminary approach to simulating fog in images using accurate physical and photometric models, in order to study the influence of small particles on computer vision algorithms.
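A standard way to implement such a physical fog model (possibly differing from the report's exact formulation) is Koschmieder's law, which attenuates scene radiance with depth and blends in the airlight; the parameter values below are illustrative:

```python
import numpy as np

def add_fog(image, depth, beta=0.05, airlight=0.8):
    """Apply Koschmieder's law: I = I0 * exp(-beta*d) + A * (1 - exp(-beta*d)).

    image:    clean image, float array in [0, 1], shape (H, W, 3)
    depth:    per-pixel scene depth in metres, shape (H, W)
    beta:     extinction coefficient; denser fog = larger beta
    airlight: atmospheric light intensity A
    """
    t = np.exp(-beta * depth)[..., None]    # per-pixel transmission map
    return image * t + airlight * (1.0 - t)
```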
- …